CORRELATION VERSUS CURVE FITTING IN RESEARCH 
ON ACCIDENT PRONENESS: REPLY TO MARITZ 


MILTON L. BLUM AND ALEXANDER MINTZ
City College of New York 


Maritz (3) states that the technique of correlating accident records 
in two successive periods is indispensable as evidence of accident 
proneness. He suggests that the fitting of theoretical Poisson and
Negative Binomial distributions is not an adequate criterion of the
absence or presence of differences in accident proneness in a group.
There are certain weaknesses in this position, and a clarification is 
demanded. To reiterate the major points of our earlier paper (4): 

1. Personal accident proneness, a component of accident liability, has been
overemphasized.

2. This can be demonstrated by a method which reveals the extent to
which accident records could be attributed to differences in accident liability,
and this was found to be 20 per cent to 40 per cent of the total variance of
accident records.

3. This method is based on the use of univariate distributions.


We agree with Maritz that a good Negative Binomial fit does not 
prove the existence of differences in accident proneness with mathe- 
matical certainty. We never said that it did. It is doubtful whether 
anything is ever proved with mathematical certainty in empirical 
sciences. 

The issue raised by Maritz is that the correlation technique is in- 
dispensable in the establishment of accident proneness and that the 
evidence from a univariate distribution is invalid. We strongly dis- 
agree. We shall show that the two techniques give much the same 
information and, therefore, either can be offered as evidence for accident
proneness. In fact, neither is wholly conclusive. Detailed histories
of individuals’ accident careers should be better than either. 

Maritz views correlations as the “direct technique” of establishing
differences in accident proneness. His claim is that “the most direct
method of establishing proneness in a group of people all of whom
ought to be exposed to the same environmental risk, consists of splitting
a lengthy period of observation into two periods and correlating the
frequency of accidents per individual for these two periods. This
statistical technique is nearest to the psychological definition of
accident proneness.” He attempts to show that this technique may
contradict conclusions based on univariate distributions. His evidence is
based upon two hypothetical distributions and a previously unpublished
distribution by Adelstein. One of his hypothetical distributions re- 
sembles a chance pattern and yet yields a correlation in successive 
periods. The other is suggestive of differences in accident proneness but 
is uncorrelated in successive periods. Adelstein’s data, in his opinion, 
illustrate two simple chance distributions which are correlated with 
each other. 

Maritz’ two hypothetical distributions illustrate the mathematical 
possibility for two Poisson distributions to be correlated, and for a 
Negative Binomial Distribution to result from summation of two 
uncorrelated distributions. However, the occurrence of both kinds of 
bivariate distributions in the case of accident records is unlikely. The 
mathematical derivation of the bivariate correlated Poisson distribution 
which Maritz applies to accident records shows only that it can be 
approximated by drawing colored balls from enclosed boxes. We do 
not believe that such a distribution should be expected in the case of 
accident records. The only obvious meaning of a Poisson distribution 
of accident records is that it is satisfactorily explained by the assumption
of equal and constant accident liability. Without such an assumption
it looks like a result of an odd coincidence. If Poisson distributions
of accidents are the results of equal liability, they should be uncor- 
related, and it is not clear what kind of combination of circumstances 
should lead to the expectation of Poisson distributions correlated in 
successive periods. 

In his other hypothetical distribution this lack of plausibility is
obvious. For example, all six hypothetical individuals who had more
than eighteen accidents each in one period had zero accidents in the 
other period. This could occur, but is hardly to be expected. In other 
words, in this example, Maritz proves that it is possible to obtain a 
correlation of -.11 by arranging numbers in a manner designed to
obtain it. This is granted, but what does it prove about accidents? 

The only empirical material Maritz presents is the unpublished 
Adelstein data, and it must be regarded with more seriousness than his 
two hypothetical distributions, which are mathematically possible but 
highly improbable. Maritz claims that the examination of the univariate
distributions suggests a “pure chance pattern,” but that the
correlation between the accident records in the two periods reveals the
existence of differences in accident proneness to the extent of a correlation
of .29. The existence of the correlation is treated as something
that could not have been foreseen in terms of the accident distributions 
in the two observation periods. Thus, the impression is created that the 
combined period had properties which were essentially different from
those of the shorter periods when considered alone. He bases his interpretation
on the fact that three χ² tests fail to reveal significant differences
either between two Poisson distributions and Adelstein’s five-
and six-year distributions, or between a bivariate correlated Poisson
distribution and Adelstein’s scatter diagram for the two periods.

In dealing with these data Maritz makes the common error of con- 
fusing the failure to disprove a hypothesis with its proof. He states: 
“Equation [1] was fitted to the observed data of Table III and the 
resulting test of goodness of fit gave for 7 df, P=.49. Hence it follows 
that the above data follow a correlated bivariate Poisson distribution” 
(p. 438). This second sentence does not follow from the first. The 
failure to disprove a hypothesis according to which the data for a com- 
bined period have properties (the .29 correlation) different from those 
of the constituent periods (viewed as exhibiting a simple chance pattern) 
is not the same as proof of it. A closer examination of the data for 
direct evidence of this type of heterogeneity fails to reveal anything 
convincing, and the opposite and theoretically more plausible hypoth- 
esis of essential homogeneity of the eleven-year observation period fits 
the data more closely than Maritz’ heterogeneity (i.e., bivariate cor- 
related Poisson) hypothesis. 

In terms of the properties of the accident distributions in the two 
consecutive periods, the most probable correlation between the accident 
records in these periods is not zero as Maritz implies, but .21. This 
is quite close to the observed .29. The estimate of a correlation of .21
was arrived at by determining the estimated percentages of the accident
records attributable to factors other than chance (18 per cent and
24.4 per cent) and then computing their geometric mean. In determining
these percentages the formula (v − m)/v was used; the two variances
were 1.370 and 1.485, respectively, and the two means were both 1.123.
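The arithmetic behind the .21 estimate can be checked with a short sketch. The function and variable names are illustrative assumptions, and the pairing of the variance 1.370 with the first period follows the heading of Table I; the geometric mean is the same whichever way the two periods are ordered.

```python
import math

def non_chance_share(mean, variance):
    """Share of the variance in excess of the Poisson expectation: (v - m) / v."""
    return (variance - mean) / variance

# Figures quoted in the text: a common mean and the two period variances.
mean_accidents = 1.123
variance_first, variance_second = 1.370, 1.485

share_first = non_chance_share(mean_accidents, variance_first)
share_second = non_chance_share(mean_accidents, variance_second)

# The expected inter-period correlation is taken as the geometric mean
# of the two non-chance shares.
expected_r = math.sqrt(share_first * share_second)

print(f"shares: {share_first:.1%} and {share_second:.1%}; expected r = {expected_r:.2f}")
```

Under these figures the two shares come out near 18 per cent and 24.4 per cent, and their geometric mean rounds to .21.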

If the eleven-year period had no properties essentially different 
from those of the shorter period, it should be possible to construct 
theoretical scatter diagrams approximating the empirical one by 
utilizing statistics derived either from the two periods considered 
separately (without considering their correlation), or from either one 
of them taken alone, or from the total period taken as a whole. The 
form of the bivariate distribution chosen was the bivariate Negative 
Binomial. It is based on much the same assumptions as those made
by Greenwood and Yule (1) in their derivation of a univariate unequal 
liability distribution, namely: 

Accident liability is distributed in people in accordance with a Pearson 
III curve. 



Accident liability of a person remains constant per unit of time throughout 
the two observation periods. 

Each particular degree of accident liability gives a simple chance (Poisson) 
distribution of accident records in each observation period (p. 279). 


Table I presents the Adelstein data together with a theoretical 
distribution constructed in accordance with these assumptions; the 
computation utilizes only the mean and the variance of the first period, 
and the fact that the second period lasted six years while the first one 
lasted five years. The formula for the theoretical frequency for the
cell representing j accidents in the first period and k accidents in the
second is

\[
f_{jk} = N \, \frac{c^{p}\, a^{k}\, \Gamma(p + j + k)}{j!\, k!\, \Gamma(p)\, (c + a + 1)^{p + j + k}}
\]

in which N is the number of individuals, a is the ratio of the duration
of the second period to that of the first, and c and p are two constants
derived from the mean and variance of the first period, as
follows: c = m/(v − m), p = m²/(v − m). (The derivation of the formula,
which closely resembles that of Greenwood and Yule, will be published
elsewhere.)
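The construction of the theoretical cell frequencies can be sketched as follows. This is a hedged illustration, not the authors' own computation: it assumes the usual Poisson-gamma mixture form implied by the three assumptions listed above, with liability following a gamma (Pearson III) curve of shape p and rate c, N = 122 individuals as in the total of Table I, and a = 6/5 as the ratio of the second duration to the first. The function names are hypothetical.

```python
import math

def greenwood_yule_constants(mean, variance):
    """Return (c, p) with c = m / (v - m) and p = m^2 / (v - m)."""
    excess = variance - mean
    return mean / excess, mean ** 2 / excess

def expected_frequency(j, k, n, c, p, a):
    """Expected number of people with j first-period and k second-period accidents.

    Works on the log scale (lgamma) to avoid overflow in the gamma function.
    """
    log_prob = (
        p * math.log(c)
        + k * math.log(a)
        + math.lgamma(p + j + k)
        - math.lgamma(j + 1)          # log j!
        - math.lgamma(k + 1)          # log k!
        - math.lgamma(p)
        - (p + j + k) * math.log(c + a + 1)
    )
    return n * math.exp(log_prob)

# First-period mean and variance from the heading of Table I.
c, p = greenwood_yule_constants(1.123, 1.370)
a = 6 / 5      # second period lasted six years, the first five
n = 122        # assumed group size (total of Table I)

for j in range(3):
    row = [expected_frequency(j, k, n, c, p, a) for k in range(4)]
    print(j, " ".join(f"{value:5.1f}" for value in row))
```

Under these assumptions the cell for no accidents in either period comes out close to the 16.3 given in Table I, which suggests the sketch matches the form of the distribution the authors used.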





TABLE I

A COMPARISON OF ADELSTEIN’S ACTUAL ACCIDENT DATA WITH THE THEORETICAL
BIVARIATE DISTRIBUTION COMPUTED FROM HIS FIRST-PERIOD DATA
(m = 1.123, v = 1.370)

First                                     Second Period
Period        0         1         2         3         4         5         6         7        Total
  0       21(16.3)  14(14.8)   8(8.0)    1(3.3)    -(1.2)    -(0.4)    -(0.1)               44(44.1)
  1       17(12.3)  12(13.4)   8(8.4)    3(4.1)    1(1.6)    -(0.6)    -(0.2)    1(0.1)     42(40.7)
  2        6(5.6)    9(7.0)    2(5.1)    2(2.7)    2(1.2)    -(0.5)    -(0.2)    -(0.1)     21(22.4)
  3        1(1.9)    1(2.8)    3(2.3)    3(1.4)    1(0.7)    -(0.3)    -(0.1)                9(9.5)
  4        1(0.6)    3(1.0)    -(0.9)    -(0.6)    -(0.3)    -(0.1)                          4(3.5)
  5        -(0.2)    -(0.3)    -(0.3)    2(0.2)    -(0.1)                                    2(1.0)
  6        -(0.1)    -(0.1)    -(0.1)    -(0.1)                                              0(0.4)

Totals    46(37.0)  39(39.4)  21(25.1)  11(12.4)   4(5.1)    0(1.9)    0(0.6)    1(0.2)    122(121.7)


The theoretical distribution appears to fit the data quite well. The 
χ² computed was 8.743, df = 10, P = .58. It follows that the correlation
technique recommended by Maritz did not add anything significantly 
new to the information one could gather by examining one of Adelstein’s 
univariate distributions. 

In arguing for correlations and against properties of univariate 
distributions as grounds for assuming differences in accident proneness,
Maritz overlooks the fact that these two kinds of statistical measures
are closely interrelated. The correlation between accident records in
two periods can always be computed from the variances in these two
periods and in the total period, according to the formula


\[
r = \frac{V_{1+2} - V_1 - V_2}{2\sqrt{V_1 V_2}}
\]


Similarly, the increase in accident variance when one combines two
observation periods is an increasing function of the correlation between
these periods, in accordance with the elementary formula
$V_{1+2} = V_1 + V_2 + 2r\sqrt{V_1 V_2}$. Inasmuch as this formula is exact, and inasmuch as every
observation period has an early stage when the variance is smaller than 
the mean, a distribution cannot have a variance greater than the mean 
value unless there were positive correlations between successive periods 
somewhere in the past. Similarly, a Poisson distribution cannot result 
unless the accident records in the subdivisions of the observation period 
were uncorrelated or unless the effects of positive and negative correla- 
tions cancelled each other. If an examination of two univariate distri- 
butions in two successive periods suggests something markedly different 
about the existence of accident proneness, compared to the correlation 
between these periods, the interperiod correlations must have similarly 
changed in the past. Maritz’ hypothetical distributions could be used 
just as readily in arguing against the use of correlations in accident 
research as against the use of variances. The only advantage of a 
correlation lies in the fact that it enables one to tell a particular time 
when the variance of a set of accident records has risen at a rate beyond 
chance expectation. 
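The interrelation of variances and correlation can be illustrated on the observed counts of Table I. The sketch below is only a check under stated assumptions: the cell counts are as read from Table I, variances use the divisor N, and the helper name is hypothetical. It computes the correlation once from the three variances alone and once directly from the paired records.

```python
import math

# Observed counts from Table I:
# (first-period accidents, second-period accidents) -> number of people.
observed = {
    (0, 0): 21, (0, 1): 14, (0, 2): 8, (0, 3): 1,
    (1, 0): 17, (1, 1): 12, (1, 2): 8, (1, 3): 3, (1, 4): 1, (1, 7): 1,
    (2, 0): 6, (2, 1): 9, (2, 2): 2, (2, 3): 2, (2, 4): 2,
    (3, 0): 1, (3, 1): 1, (3, 2): 3, (3, 3): 3, (3, 4): 1,
    (4, 0): 1, (4, 1): 3,
    (5, 3): 2,
}

def weighted_mean_var(pairs):
    """Mean and variance (divisor N) from (value, count) pairs."""
    n = sum(count for _, count in pairs)
    mean = sum(value * count for value, count in pairs) / n
    var = sum(count * (value - mean) ** 2 for value, count in pairs) / n
    return mean, var

m1, v1 = weighted_mean_var([(j, cnt) for (j, k), cnt in observed.items()])
m2, v2 = weighted_mean_var([(k, cnt) for (j, k), cnt in observed.items()])
_, v_total = weighted_mean_var([(j + k, cnt) for (j, k), cnt in observed.items()])

# Correlation recovered from the three variances only.
r_from_variances = (v_total - v1 - v2) / (2 * math.sqrt(v1 * v2))

# Correlation computed directly from the paired records, for comparison.
n_people = sum(observed.values())
covariance = sum(cnt * (j - m1) * (k - m2) for (j, k), cnt in observed.items()) / n_people
r_direct = covariance / math.sqrt(v1 * v2)

print(f"V1 = {v1:.3f}, V2 = {v2:.3f}, V(1+2) = {v_total:.3f}")
print(f"r from variances = {r_from_variances:.2f}, r direct = {r_direct:.2f}")
```

The two values of r agree because the variance of a sum is an exact identity; the correlation carries no information that is not already contained in the three variances.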

A further consideration with reference to the correlational technique 
is that it presents certain practical difficulties. Accident proneness is a 
problem for industry as well as for the theoretical statistician. From
the point of view of industry it is often difficult to acquire data on the
same individual for two successive periods. Those with high accident
rates in the first period are likely not to be found in the second period.
They may be dismissed or resign from their jobs, not to mention being
hospitalized or dead. The study reported by Kerr (2) is typical of 
practical problems confronting industry. The major effort is to reduce 
accidents, not to wait for successive periods. 


SUMMARY 


1. The hypothetical distributions presented by Maritz are mathematically
possible and demonstrate the lack of mathematical certainty
of inference from empirical data, but such distributions are not likely 
to be encountered in practice. 

2. Correlational research on accident proneness is legitimate, but 
inferences about accident proneness drawn from correlations are not 
more certain than inferences drawn from the fitting of distributions. 

3. Correlational research is not always feasible for practical reasons. 
In any event, it is not indispensable with reference to establishing 
accident proneness. 


BIBLIOGRAPHY 

1. GREENWOOD, M., & YULE, G. U. An enquiry into the nature of frequency
distributions representative of multiple happenings, with particular
reference to the occurrence of multiple attacks of disease or of repeated
accidents. J. roy. statist. Soc., 1920, 83, 255-279.

2. KERR, W. A. Accident proneness of factory departments. J. appl.
Psychol., 1950, 34, 167-170.

3. MARITZ, J. S. On the validity of inferences drawn from the fitting of
Poisson and Negative Binomial distributions to observed accident data.
Psychol. Bull., 1950, 47, 434-443.

4. MINTZ, A., & BLUM, M. L. An examination of the accident proneness
concept. J. appl. Psychol., 1949, 33, 195-211.

Received November 18, 1950.


